Lagrange policy gradient
Abstract
Most algorithms for reinforcement learning work by estimating action-value functions. Here we present a method that uses Lagrange multipliers, the costate equation, and multilayer neural networks to compute policy gradients. We show that this method can find solutions to time-optimal control problems, driving linear mechanical systems quickly to a target configuration. On these tasks its performance is comparable to that of deep deterministic policy gradient, a recent action-value method.
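The approach in the abstract, adjoining the dynamics with Lagrange multipliers and running the costate equation backward to obtain a policy gradient, can be sketched on a toy problem. This is a hypothetical illustration, not the paper's implementation: a linear policy u = -Kx stands in for the multilayer network, a quadratic cost stands in for the time-optimal objective, and the double-integrator matrices `A`, `B`, the weights `Q`, `R`, and the step-size choices are all assumptions made for the example.

```python
import numpy as np

# Toy setup (assumed, not from the paper): a double integrator
# x_{t+1} = A x_t + B u_t with linear policy u_t = -K x_t and
# quadratic cost J = sum_t x_t^T Q x_t + u_t^T R u_t.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])  # position-velocity dynamics
B = np.array([[0.0], [dt]])
Q = np.eye(2)        # penalizes distance from the target (the origin)
R = 0.1 * np.eye(1)  # penalizes control effort
T = 50               # horizon length

def rollout(K, x0):
    """Simulate the closed loop: states x_0..x_T, controls u_0..u_{T-1}."""
    xs, us = [x0], []
    for _ in range(T):
        u = -K @ xs[-1]
        us.append(u)
        xs.append(A @ xs[-1] + B @ u)
    return xs, us

def cost(K, x0):
    xs, us = rollout(K, x0)
    return sum(x @ Q @ x for x in xs) + sum(u @ R @ u for u in us)

def costate_gradient(K, x0):
    """dJ/dK from one backward pass of the costate (adjoint) recursion
        lam_t = 2 Q x_t + 2 K^T R K x_t + (A - B K)^T lam_{t+1},
    the stationarity condition of the Lagrangian that adjoins the
    dynamics with multipliers lam_t."""
    xs, us = rollout(K, x0)
    lam = 2.0 * Q @ xs[T]  # terminal costate: gradient of the final cost
    grad = np.zeros_like(K)
    for t in reversed(range(T)):
        # u_t = -K x_t, so dJ/dK accumulates -(dl/du_t + B^T lam_{t+1}) x_t^T
        grad -= np.outer(2.0 * R @ us[t] + B.T @ lam, xs[t])
        lam = 2.0 * Q @ xs[t] + 2.0 * K.T @ R @ K @ xs[t] + (A - B @ K).T @ lam
    return grad

# Plain gradient descent on the policy parameters from a stabilizing guess.
K = np.array([[0.5, 0.3]])
x0 = np.array([1.0, 0.0])
for _ in range(100):
    g = costate_gradient(K, x0)
    K -= 0.02 * g / (1.0 + np.linalg.norm(g))  # normalized step for stability
```

The backward pass costs one extra rollout-length loop regardless of how many policy parameters there are, which is the usual appeal of the adjoint method; in the paper the same multipliers would propagate through a neural-network policy instead of the matrix K.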
Similar papers
An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes
We develop in this article the first actor–critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagr...
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented v...
The Linear Nonconvex Generalized Gradient and Lagrange Multipliers
A Lagrange multiplier rule that uses small generalized gradients is introduced. It includes both inequality and set constraints. The generalized gradient is the linear generalized gradient. It is smaller than the generalized gradients of Clarke and Mordukhovich but retains much of their nice calculus. Its convex hull is the generalized gradient of Michel and Penot if a function is Lipschitz. Th...
Lagrange Multipliers for Nonconvex Generalized Gradients with Equality, Inequality and Set Constraints
A Lagrange multiplier rule for finite dimensional Lipschitz problems is proven that uses a nonconvex generalized gradient. This result uses either both the linear generalized gradient and the generalized gradient of Mordukhovich, or the linear generalized gradient and a qualification condition involving the pseudo-Lipschitz behavior of the feasible set under perturbations. The optimization problem ...
A mu-differentiable Lagrange multiplier rule
We present some properties of the gradient of a mu-differentiable function. The Method of Lagrange Multipliers for mu-differentiable functions is then exemplified.
Journal: CoRR
Volume: abs/1711.05817
Published: 2017